Editor’s Notes

Provenance 

Two Second World War research papers by Alan Turing were declassified recently.
The papers, The Applications of Probability to Cryptography and its shorter
companion Paper on Statistics of Repetitions, are available from the National
Archives in the UK at www.nationalarchives.gov.uk.

The released papers give the full text, along with figures and tables, and provide
a fascinating insight into the preparation of the manuscripts, as well as the
style of writing at a time when typographical errors were corrected by hand, and
mathematical expressions were handwritten into spaces left in the text.

Working with the papers in their original format provides some challenges, so 
they have been typeset for easier reading and access. We recommend that the 
typeset versions are read with a copy of the original manuscript at hand. 

This document contains the text and figures for The Applications of Probability
to Cryptography; the companion paper is also available in typeset form from arXiv
at www.arxiv.org/abs/1505.04715. These notes apply to both documents.

Separately, a journal article by Zabell¹ provides an analysis of the papers and
further background information.

The text 

It is not our intent to cast Alan Turing’s manuscripts into a journal style
article, but more to provide clearer access to his writing and, perhaps, to answer
the question “If Turing had had access to typesetting software, what would
his papers have looked like?”. Consequently no “house-style” copy-editing has been
imposed. Occasional punctuation has been added to improve readability, some
obvious errors corrected, and paragraph breaks added to ease the reading of long
text blocks - and occasionally to give a better text flow. Turing uses typewriter
underlining, single, and double quotes to indicate emphasis or style; these have
been implemented using font format changes, and double quotes are used as needed.

The manuscript has many typographical errors, deletions, and substitutions, 
all of which are indicated by over-typing, crossed out items, and handwritten pencil 
or ink annotations. These corrections have been implemented in this document to 
give the text that we presume Turing intended. Additionally, there are some
handwritten notes in the manuscript, which may or may not be by Turing; these are
indicated by the use of footnotes.

British English spelling is used in the manuscript and this is retained, so words 
such as favour, neighbourhood, cancelling, etc. will be encountered. Turing appears 
to favour the spellings bigramme, trigramme, tetragramme, etc., although he is not 
always consistent; throughout this document the favoured rendering is used. 

Turing’s wording is unchanged to give the full flavour of his original style. 

This means that “That is to say that we suppose for instance that ...” will be
encountered - amongst others!

Both papers end abruptly; no summary or conclusion is offered. Perhaps the
papers are incomplete or pages are missing. To indicate the end of the manuscript
we have marked the end of each paper with a printing sign - an infinity symbol
between two horizontal bars.


¹ Zabell, S. 2012. “Commentary on Alan M. Turing: The Applications of Probability to
Cryptography”, Cryptologia, 36:191-214.
In the section on a letter subtractor problem, reference is made to other methods
to be discussed later in the paper. This does not happen - perhaps another
indicator of an incomplete paper or missing pages.

Finally, Turing uses some forward page references that appear in the manuscript 
as see(p ), obviously intending to return and complete the reference. This also does 
not happen, so these references remain unresolved. 

In short, we strive to represent Turing’s text as he wrote it. 

Ciphertext, cleartext, etc. 

In an attempt to capture the flavour of the time, ciphertext, cleartext, keys,
etc. are displayed in a fixed pitch, bold, non-serif font to represent the
typewriter, teletype, and telegraph machines that would have printed the original
code, viz. CONDITIONS.

Mathematics 

In the manuscript all mathematics is handwritten in ink and pencil in spaces
left between the typed text. Sometimes adequate space was left, other times not,
and the handwriting spills into margins and adjacent lines, adding to the reading
challenge. We have cast all mathematics into standard in-line or display formats
as appropriate. We have used the mathcal font in places to capture the flavour of
Turing’s handwriting, e.g. “the probability p” appears as “the probability $\mathcal{P}$”.

Turing uses no punctuation in his mathematics; this has been added to be
consistent with modern practice.² He also uses letters to reference equations -
numbers are used in this document. In many places we have added parentheses to give
clarity to an expression, and in some places where Turing is inconsistent in his use
of parentheses for a mathematical phrase (the expression for letter probability in
the Vigenere in particular) we have chosen one format and been consistent in its use.

As Turing demonstrates a love of dense mathematics the algebraic multiplication
symbol $\times$ has occasionally been used for readability, so all standard forms of
multiplication will be encountered, viz., $ab$, $a \times b$, $a \cdot b$. Finally,
convention suggests that the subject of a formula or expression sits on its own on
the left hand side of the equals sign, with the subsidiary variables collected on the
right hand side. Turing adheres to this convention as it suits him; his preference
is retained.

In short, we strive to retain the elegance of Turing’s mathematics, whilst casting 
it into a modern format. 

Figures and tables 

All figures have been included with rearrangement of some items to improve 
clarity or document flow. Turing uses a variety of papers, styles, inks, pen, and 
pencil; these have all been represented in standard figure and table format. 

Contents page 

Turing provides a rudimentary Contents for The Applications of Probability to
Cryptography; this has been reworked with some additions to make it more
meaningful. Paper on Statistics of Repetitions, being much shorter, requires no Contents.

Editors 

The editor can be contacted at: ian.taylor@maths.oxon.org. 


² See, for instance, Higham, Nicholas J. 1998. “Handbook of Writing for the Mathematical
Sciences”, SIAM, Philadelphia.
CHAPTER 1


Introduction 


1.1. Preamble 

The theory of probability may be used in cryptography with most effect when 
the type of cipher used is already fully understood, and it only remains to find the 
actual keys. It is of rather less value when one is trying to diagnose the type of 
cipher, but if definite rival theories about the type of cipher are suggested it may 
be used to decide between them. 

1.2. Meaning of probability and odds 

I shall not attempt to give a systematic account of the theory of probability,
but it may be worth while to define shortly probability and odds. The probability
of an event on certain evidence is the proportion of cases in which that event may
be expected to happen given that evidence. For instance if it is known that 20% of
men live to the age of 70, then knowing of Hitler only that Hitler is a man we can
say that the probability of Hitler living to the age of 70 is 0.2. Suppose that we
know that Hitler is now of age 52 the probability will be quite different, say 0.5,
because 50% of men of 52 live to 70.

The odds of an event happening is the ratio $\mathcal{P}/(1-\mathcal{P})$ where
$\mathcal{P}$ is the probability of it happening. This terminology is connected with
the common phraseology “odds of 5:2 on”, meaning in our terminology that the odds
are 5/2.

1.3. Probabilities based on part of the evidence 

When the whole evidence about some event is taken into account it may be 
extremely difficult to estimate the probability of the event, even very approximately, 
and it may be better to form an estimate based on a part of the evidence, so that 
the probability may be more easily calculated. This happens in cryptography in 
a very obvious way. The whole evidence when we are trying to solve a cipher is 
the complete traffic, and the events in question are the different possible keys, and 
functions of the keys. Unless the traffic is very small indeed the theoretical answer 
to the problem “What are the probabilities of the various keys?” will be of the form
“The key ... has a probability differing almost imperceptibly from 1 (certainty)
and the other keys are virtually impossible”. But a direct attempt to determine
these probabilities would obviously not be a practical method. 

1.4. A priori probabilities 

The evidence concerning the possibility of an event occurring usually divides 
into a part about which statistics are available, or some mathematical method can 
be applied, and a less definite part about which one can only use one’s judgement. 
Suppose for example that a new kind of traffic has turned up and that only three
messages are available. Each message has the letter V in the 17th place and G in
the 18th place. We want to know the probability that it is a general rule that we
should find V and G in these places. We first have to decide how probable it is that
a cipher would have such a rule, and as regards this one can probably only guess,
and my guess would be about 1/5,000,000. This judgement is not entirely a guess;
some rather insecure mathematical reasoning has gone into it, something like this:-

The chance of there being a rule that two consecutive letters somewhere after
the 10th should have certain fixed values seems to be about 1/500 (this is a complete
guess). The chance of the letters being the 17th and 18th is about 1/15 (another
guess, but not quite as much in the air). The probability of a letter being V or
G is 1/676 (hardly a guess at all, but expressing a judgement that there is no
special virtue in the bigramme VG). Hence the chance is $1/(500 \times 15 \times 676)$
or about 1/5,000,000. This is however all so vague, that it is more usual to make the
judgement “1/5,000,000” without explanation.

The question as to what is the chance of having a rule of this kind might of 
course be resolved by statistics of some kind, but there is no point in having this 
very accurate, and of course the experience of the cryptographer itself forms a kind 
of statistics. 

The remainder of the problem is then solved quite mathematically. Let us
consider a large number of ciphers chosen at random, $N$ of them say. Of these
$N/5{,}000{,}000$ of them will have the rule in question, and the remainder not. Now if
we had three messages of each of the ciphers before us, we should find that for each
of the ciphers with the rule, three messages have VG in the required place, but of
the remaining $(4{,}999{,}999 \times N)/5{,}000{,}000$ only a proportion $1/676^3$ will
have them. Rejecting the ciphers which have not the required characteristics we are
left with $N/5{,}000{,}000$ cases where the rule holds, and
$(4{,}999{,}999 \times N)/(5{,}000{,}000 \times 676^3)$ cases where it does not. This
selection of ciphers is a random selection of ones which have all the known
characteristics of the one in question, and therefore the odds in favour of the rule
holding are:

$$\frac{N}{5{,}000{,}000} : \frac{4{,}999{,}999 \times N}{5{,}000{,}000 \times 676^3},$$

i.e. $676^3 : 4{,}999{,}999$,
or about 60 : 1 on.
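
The arithmetic of this example can be checked with a short sketch. The variable names are illustrative assumptions and are not taken from the manuscript; the numbers are the guessed a priori chance of 1/5,000,000 and the chance $1/676^3$ of three accidental occurrences of VG.

```python
from fractions import Fraction

# Assumed inputs, copied from the worked example in the text.
prior = Fraction(1, 5_000_000)             # guessed chance that a cipher has the VG rule
p_data_if_rule = Fraction(1)               # with the rule, all three messages show VG
p_data_if_random = Fraction(1, 676) ** 3   # without it, VG must turn up by chance three times

prior_odds = prior / (1 - prior)           # 1 : 4,999,999
odds_on_rule = prior_odds * p_data_if_rule / p_data_if_random

print(odds_on_rule)          # exact ratio 676^3 : 4,999,999
print(float(odds_on_rule))   # a little under 62, in line with "about 60 : 1 on"
```

Exact fractions are used here only so that the ratio $676^3 : 4{,}999{,}999$ appears unrounded.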

It should be noticed that the whole argument is to some extent fallacious, as it is 
assumed that there are only two possibilities, viz. that either VG must always occur 
in that position, or else that the letters in the 17th and 18th positions are wholly 
random. There are however many other possibilities worth consideration, e.g. 

(1) On the day in question we have VG in the position in question. 

(2) Or on another day we have some other fixed pair of letters. 

(3) Or in the positions 17, 18 we have to have one of the four combinations 
VG, RH, OM, IL and by chance VG has been chosen for all the three messages 
we have had. 

(4) Or the cipher is a simple substitution and VG is the substitute of some 
common bigramme, say TH. 

The possibilities are of course endless, and it is therefore always necessary to 
bear in mind the possibility of there being other theories not yet suggested. 
The a priori probability sometimes has to be estimated as above by some sort
of guesswork, but often the situation is more satisfactory. Suppose for example
that we know that a certain cipher is a simple substitution, the keys having no
specially noticeable properties. Suppose also that we have 50 letters of such a
message including five occurrences of P. We want to know how probable it is that
P is the substitute of E. As before we have to answer two questions.

(1) How likely is it that P would be the substitute of E neglecting the evidence 
of the five Es occurring in the message? 

(2) How likely are we to get 5 Ps? 

(a) If P is not the substitute of E 

(b) If P is the substitute of E. 

I will not attempt to answer the second question for the present. The answer 
to the first is simply that the probability of a letter being the substitute of E is 
independent of what the letter is, and is therefore always 1/26, in particular it is 
1/26 for the letter P. The only guesswork here is the judgement that the keys are 
chosen at random. 


1.5. The Factor Principle 

Nearly all applications of probability to cryptography depend on the factor 
principle (or Bayes’ Theorem). This principle may first be illustrated by a simple 
example. Suppose that one man in five dies of heart failure, and that of the men 
who die of heart failure two in three die in their beds, but of the men who die from 
other causes only one in four dies in their beds. (My facts are no doubt hopelessly 
inaccurate). Now suppose we know that a certain man died in his bed. What is 
the probability that he died of heart failure? Of all men numbering $N$ say we find that

    $N \times (1/5) \times (2/3)$    die in their beds of heart failure
    $N \times (1/5) \times (1/3)$    ... elsewhere ...
    $N \times (4/5) \times (1/4)$    die in their beds from other causes
    $N \times (4/5) \times (3/4)$    ... elsewhere ...


Now as our man died in his bed we do not need to consider the cases of men who 
did not die in their beds, and these consist of 


    $N \times (1/5) \times (2/3)$ cases of heart failure and
    $N \times (4/5) \times (1/4)$ from other causes

and therefore the odds are $1 \times (2/3) : 4 \times (1/4)$ in favour of heart
failure. If this had been done algebraically the result would have been

$$\text{A posteriori odds of the theory} = \text{A priori odds of the theory} \times \frac{\text{Probability of the data being fulfilled if the theory is true}}{\text{Probability of the data being fulfilled if the theory is false}}.$$


In this the theory is that the man died of heart failure, and the data is that he died 
in his bed. 
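
As a minimal sketch of the factor principle applied to this example, the function below multiplies a priori odds by the factor. The function name `posterior_odds` is an assumption made for illustration; the figures are the (admittedly rough) ones given above.

```python
from fractions import Fraction

def posterior_odds(prior_odds, p_data_if_true, p_data_if_false):
    """Factor principle: a posteriori odds = a priori odds x factor."""
    return prior_odds * (p_data_if_true / p_data_if_false)

# One man in five dies of heart failure, so the a priori odds are (1/5) : (4/5).
prior_odds = Fraction(1, 5) / Fraction(4, 5)

# Theory: he died of heart failure. Data: he died in his bed.
odds = posterior_odds(prior_odds, Fraction(2, 3), Fraction(1, 4))
probability = odds / (1 + odds)

print(odds)          # odds in favour of heart failure
print(probability)   # the same result expressed as a probability
```

The odds of 2/3 agree with the ratio $1 \times (2/3) : 4 \times (1/4)$ obtained by counting cases.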
The general formula above will be described as the factor principle; the ratio

$$\frac{\text{Probability of the data if the theory is true}}{\text{Probability of the data if the theory is false}}$$

is called the factor for the theory on account of the data.

1.6. Decibanage 

Usually when we are estimating the probability of a theory there will be several 
independent pieces of evidence e.g. following our last example, where we want to 
know whether a certain man died of heart failure or not, we may know 

(1) He died in his bed 

(2) His father died of heart failure 

(3) His bedroom was on the ground floor 
and also have statistics telling us 

(a) 2/3 of men who die of heart failure die in their beds 

(b) 2/5 ... have fathers who died of heart failure

(c) 1/2 ... have bedroom on the ground floor

(d) 1/4 of men who died from other causes die in their beds

(e) 1/6 ... have fathers who died of heart failure

(f) 1/20 of men who die of other cause have their bedrooms on the ground floor

Let us suppose that the three pieces of evidence are independent of one another
if we know that he died of heart failure, and also if we know that he did not die
of heart failure. That is to say that we suppose for instance that knowing that he
slept on the ground floor does not make it any more likely that he died in his bed if
we knew all along that he died of heart failure. When we make these assumptions
the probability of a man who died of heart failure satisfying all three conditions
is obtained simply by multiplication, and is $(2/3) \times (2/5) \times (1/2)$ and
likewise for those who died from other causes the probability is
$(1/4) \times (1/6) \times (1/20)$, and the factor in favour of the heart failure
theory is

$$\frac{(2/3) \times (2/5) \times (1/2)}{(1/4) \times (1/6) \times (1/20)}.$$

We may regard this as the product of three factors $(2/3)/(1/4)$ and $(2/5)/(1/6)$
and $(1/2)/(1/20)$ arising from the three independent pieces of evidence. Products
like this arise very frequently, and sometimes one will get products involving
thousands of factors, and large groups of these factors may be equal. We naturally
therefore work in terms of the logarithms of the factors. The logarithm of the
factor, taken to the base $10^{1/10}$, is called decibanage in favour of the theory.
A deciban is a unit of evidence; a piece of evidence is worth a deciban if it
increases the odds of the theory in the ratio $10^{1/10} : 1$. The deciban is used as
a more convenient unit than the ban. The terminology was introduced in honor of the
famous town of Banbury.

Using this terminology we might say that the fact that our man died in bed
scores 4.3 decibans in favour of the heart failure theory ($10 \log_{10}(8/3) = 4.3$).
We score a further 3.8 decibans for his father dying of heart failure, and 10 for his
having his bedroom on the ground floor, totalling 18.1 decibans. We then bring in
the a priori odds 1/4 or $10^{-6/10}$ and the result is that the odds are
$10^{12.1/10}$, or as we may say “12.1 deciban up on evens”. This means about 16:1 on.
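
The deciban bookkeeping can be reproduced in a few lines. This is a sketch only; the helper `decibans` and the dictionary keys are names chosen here for illustration, and the factors are those of the heart failure example.

```python
import math

def decibans(factor):
    """Logarithm of a factor to the base 10^(1/10)."""
    return 10 * math.log10(factor)

# Factors from the three pieces of evidence, assumed independent as in the text.
factors = {
    "died in his bed": (2 / 3) / (1 / 4),
    "father died of heart failure": (2 / 5) / (1 / 6),
    "bedroom on the ground floor": (1 / 2) / (1 / 20),
}
scores = {name: decibans(f) for name, f in factors.items()}

evidence_total = sum(scores.values())          # about 18.1 decibans
final_score = evidence_total + decibans(1 / 4)  # add the a priori odds of 1/4
final_odds = 10 ** (final_score / 10)

for name, score in scores.items():
    print(f"{name}: {score:.1f} decibans")
print(f"total {evidence_total:.1f}, final odds {final_odds:.1f}:1 on")
```

Adding logarithms and multiplying factors are the same operation, so the final odds equal the a priori odds times the product of the three factors.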






CHAPTER 2 


Straightforward Cryptographic Problems 


2.1. Vigenere 


The factor principle can be applied to the solutions of a Vigenere problem 
with great effect. I will assume here that the period of the cipher has already been 
determined. Probability theory may be applied to this part of the problem also, but 
that is not so elementary. Suppose our cipher, written out in its correct period is¹

DKQHSHZNMP 

RCVXUHTEAQ 

XHPUEPPSBK 

TWUJAGDYOJ 

THWCYDZHGA 

PZKOXOEYAE 

BOKBUBPIKR 

WWACEJPHLP 

TUZYFHLRYC 

Figure 1. Vigenere problem. 

(It is only by chance that it makes a rectangular array.) 


Let us try to find the key for the first column, and for the moment let us only take 
into account the evidence afforded by the first letter D. Let us first consider the key 
B. The factor principle tells us 


$$\text{Odds in favour of key B} = \text{A priori odds in favour of key B} \times \frac{\text{Probability of getting D in cipher if key is B}}{\text{Probability of getting D in cipher if key is not B}}.$$

Now the a priori odds in favour of key B may be taken as 1/25. The probability
of getting D in the cipher with the key B is just the probability of getting C in the
clear which (using the count on 1000 letters in Fig 2) is 0.021. If however the key
is not B we can have any letter other than C in the clear, and the probability is
$(1 - 0.021)/25$. Using the evidence of the D then the odds in favour of the key B are

$$\frac{1}{25} \times \left( \frac{25 \times 0.021}{1 - 0.021} \right).$$


¹ Turing’s statement of the ciphertext is slightly different to what he decodes. The N M at
the end of the first line are reversed to read DKQHSHZMNP in Fig 5, which gives the correct cleartext.
We may then consider the effect of the next letter in the column R which gives a
further factor of $(25 \times 0.064)/(1 - 0.064)$. We are here assuming that the
evidence of the R is independent of the evidence of the D. This is not quite correct,
but is a useful approximation; a more accurate method of calculation will be given
later. Let us write $\mathcal{P}_\alpha$ for the frequency of the letter $\alpha$ in
plain language. Then our final estimate for the odds in favour of key B is

$$\frac{1}{25} \prod_i \frac{25\,\mathcal{P}_{\alpha_i - 1}}{1 - \mathcal{P}_{\alpha_i - 1}},$$

where $\alpha_1, \alpha_2, \ldots$ is the series of letters in the 1st column, and we
use the letters and numbers interchangeably, A meaning 1, B meaning 2, ..., Z meaning
26 or 0. More generally for key $\beta$ the odds are

$$\frac{1}{25} \prod_i \frac{25\,\mathcal{P}_{\alpha_i - \beta + 1}}{1 - \mathcal{P}_{\alpha_i - \beta + 1}}.$$

The value of this can be calculated by having a table of the decibanage corresponding
to the factors $25\mathcal{P}_\alpha/(1 - \mathcal{P}_\alpha)$. One then decodes the
column with the various possible keys, looks up the decibanage, and adds them up.

The most convenient form for doing this is a table of values of
$20 \log_{10}[25\mathcal{P}_\alpha/(1 - \mathcal{P}_\alpha)]$, taken to the nearest
integer, or as we may say, the values of the score in half decibans. One may also have
columns showing multiples of these, and the table made of double height² (Fig 3). For
the first column with key B the decoded column is CQWS$^3$OAV³ and we score -5 for C,
-26 for Q, -5 for W, 17 for the three letters S, 5 for O, 7 for A and -10 for V,
totalling -17. These calculations can be done very quickly by the use of the
transparent gadget Fig 4, in which squares are ringed in pencil to show the number of
letters occurring in the column.
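
The scoring of the first column with key B can be sketched as follows. The letter counts are those of Fig 2; the function names are assumptions introduced here. Because the sketch uses unrounded values of $20 \log_{10}[25\mathcal{P}_\alpha/(1 - \mathcal{P}_\alpha)]$, its total is close to, but not identical with, the -17 obtained from the rounded table.

```python
import math
import string

# Counts per 1000 letters of English, copied from Fig 2 (A to Z).
COUNTS = [84, 23, 21, 46, 116, 20, 25, 49, 76, 2, 5, 38, 34,
          66, 66, 15, 2, 64, 73, 81, 19, 11, 21, 16, 24, 3]
FREQ = {letter: n / 1000 for letter, n in zip(string.ascii_uppercase, COUNTS)}

def half_decibans(letter):
    """Score 20*log10(25p/(1-p)) for one decoded letter."""
    p = FREQ[letter]
    return 20 * math.log10(25 * p / (1 - p))

def decode(column, key):
    """Undo a Vigenere slide: key A leaves a letter unchanged, key B moves it back one."""
    shift = ord(key) - ord("A")
    return "".join(chr((ord(c) - ord("A") - shift) % 26 + ord("A")) for c in column)

COLUMN_1 = "DRXTTPBWT"   # first column of the cipher in Fig 1
plain_b = decode(COLUMN_1, "B")
score_b = sum(half_decibans(ch) for ch in plain_b)

print(plain_b)              # the decoded column for key B
print(round(score_b, 1))    # unrounded total in half decibans
```

A strongly negative total, as here, counts against the key; the same loop run over all 26 keys gives the scores written beside each key in Fig 5.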


    A    84      J     2      S    73
    B    23      K     5      T    81
    C    21      L    38      U    19
    D    46      M    34      V    11
    E   116      N    66      W    21
    F    20      O    66      X    16
    G    25      P    15      Y    24
    H    49      Q     2      Z     3
    I    76      R    64

Figure 2. Count on 1000 letters.
(English text)

The value for X has been taken more or less at random as a compromise
between real language & telegraphese. Also I added to each entry (see p )⁴


² Turing provides a table of double height for Fig 3 to allow the “gadget” of Figure 4 to be
used with any letter of the alphabet as a decode key - hence the double alphabet. Figure 4 can be
prepared as a transparency, with the original markings cleared, and markings for the new decode
letter added. Fig 3 and Fig 4 are correctly proportioned in this document for this to work.

³ S$^3$ means SSS, for a total of three letters S, as noted in the following arithmetic. The
linear decode for the example is CQWSSOAVS.

⁴ Forward reference left unresolved in the manuscript.
    Letter     x5     x4     x3     x2     x1
      A        31     26     20     13      7
      B       -23    -18    -14     -9     -5
      C       -26    -21    -16    -10     -5
      D         7      6      4      3      1
      E        48     38     29     19     10
      F       -28    -22    -17    -11     -6
      G       -19    -15    -11     -8     -4
      H        10      8      6      4      2
      I        29     23     17     12      6
      J      -131   -103    -77    -52    -26
      K       -99    -79    -59    -40    -20
      L        -2     -2     -1     -1      0
      M        -6     -5     -4     -2     -1
      N        23     18     14      9      5
      O        23     18     14      9      5
      P       -41    -33    -25    -16     -8
      Q      -131   -103    -77    -52    -26
      R        22     18     13      9      4
      S        28     22     17     11      6
      T        32     26     19     13      6
      U       -31    -25    -19    -12     -6
      V       -54    -43    -32    -22    -10
      W       -26    -21    -16    -10     -5
      X       -38    -30    -23    -15     -8
      Y       -20    -16    -12     -8     -4
      Z      -111    -89    -67    -44    -22

(The rows A to Z are printed twice in succession to give the table its double
height. The columns hold five, four, three, two and one times the score of a letter.)

Figure 3. Table for scoring a Vigenere.
In units of half a deciban.
Figure 4. Apparatus for scoring a Vigenere.

Pencil marks arranged for 1st wheel of Fig. 1.

The gadget may be placed over Fig 3 in various positions corresponding to the
various keys. The score is obtained by adding up the numbers showing through
the various squares. In Fig 5 the alphabet has been written in a vertical below the
cipher text of Fig 1, each letter representing a possible key. The score for each key
has been written opposite the key, and under the relevant column. An X denotes
a bad score, not worth adding up. Usually these will be -15 or worse. It will be
seen that for the first column P, having a score of 43 is extremely likely to be right,
especially as there is no other score better than 8. If we neglect this latter fact
the odds for the key are $(1/25)10^{43/20}$ i.e. about 5:1 on. The effect of decoding
this column with key P has been shown underneath.
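
The conversion from a score in half decibans back to odds is a one-line calculation. The sketch below takes the score of 43 for key P as read from the text and uses the a priori odds of 1/25; the variable names are illustrative.

```python
score_half_decibans = 43   # score for key P in the first column, as quoted in the text
prior_odds = 1 / 25        # a priori odds in favour of any one key

# Twenty half decibans make a factor of ten, hence the division by 20.
odds_for_p = prior_odds * 10 ** (score_half_decibans / 20)

print(round(odds_for_p, 2))   # between 5 and 6, i.e. "about 5:1 on"
```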

For the second column the best key is O, but is by no means so certain as the first
column. The decode for this column is also shown, and provides very satisfactory 
combinations with the first column, confirming both keys. (This confirmation could 
also be based on probability theory, given a table of bigramme frequencies). In the 
third column I and C are best although D would be very possible, and in the fourth 
column Q and U are best. 
Figure 5. Scoring and solving a Vigenere.

Writing down the possible decodes we see that the first line must read OWING
and this makes the other lines read CONDI, ITHAS, EIMPO, ETOIM, ALCUL, MACHI,
HISIS, EGRET. By filling in the word CONDITIONS the whole can now be decoded.⁵

⁵ Solution: Keylength - 10, Key - POIUMOLQNY, Cleartext - OWINGTOWAR CONDITIONS
ITHASBECOM EIMPOSSIBL ETOIMPORTC ALCULATING MACHINESXT HISISVERYR EGRETTABLE
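
The solution can be confirmed by decoding Fig 1 directly. In this sketch the key is taken as POIUMOLQNY (second letter O, the best key found above for the second column) and the first ciphertext line is used in its corrected form DKQHSHZMNP; the function name `decrypt` is an assumption made for illustration.

```python
KEY = "POIUMOLQNY"

# Ciphertext of Fig 1, with the first line in the corrected form DKQHSHZMNP.
ROWS = [
    "DKQHSHZMNP", "RCVXUHTEAQ", "XHPUEPPSBK",
    "TWUJAGDYOJ", "THWCYDZHGA", "PZKOXOEYAE",
    "BOKBUBPIKR", "WWACEJPHLP", "TUZYFHLRYC",
]

def decrypt(row, key):
    """Subtract the key letter from each cipher letter, with key A meaning no slide."""
    return "".join(
        chr((ord(c) - ord(k)) % 26 + ord("A")) for c, k in zip(row, key)
    )

cleartext = [decrypt(row, KEY) for row in ROWS]
print(" ".join(cleartext))
```

Using the uncorrected first line DKQHSHZNMP instead changes only the eighth and ninth letters of the first row.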
A more accurate argument would run as follows. For the first column, instead
of setting up as rival theories the two possibilities that B is the key and that B is
not we can set up 26 rival theories that the key is A or B or ... Z, and we may apply
the factor principle in the form:-

$$\frac{\text{A posteriori probability of key A}}{\text{A priori probability of key A} \times \text{Probability of getting the given column with key A}}$$

$$= \frac{\text{A posteriori probability of key B}}{\text{A priori probability of key B} \times \text{Probability of getting the given column with key B}}$$

$$= \text{etc.}$$

The argument to justify this form of factor principle is really the same as for the
original form. Let $q_\beta$ be the a priori probability of key $\beta$. Then out of
$N$ cases we have $Nq_\beta$ cases of key $\beta$. Let $\mathcal{P}(\beta, C)$ be the
probability of getting the column $C$ with key $\beta$, then when we have rejected the
cases where we get columns other than $C$ we find that there are
$Nq_\beta\mathcal{P}(\beta, C)$ cases of key $\beta$ i.e. the a posteriori probability
of key $\beta$ is $KNq_\beta\mathcal{P}(\beta, C)$, where $K$ is independent of $\beta$.

We have therefore to calculate the probability of getting the column $C$ with
key $\beta$ and this is simply $\prod_i \mathcal{P}_{\alpha_i - \beta + 1}$, the
product of the frequencies of the decode letters which we get if the key is $\beta$.

Since the a priori probabilities of the keys are all equal we may say that the a
posteriori probabilities are in the ratio $\prod_i \mathcal{P}_{\alpha_i - \beta + 1}$
i.e. in the ratio $\prod_i 26\mathcal{P}_{\alpha_i - \beta + 1}$ which is more
convenient for calculation. The final value for the probability is then

$$\frac{\prod_i 26\,\mathcal{P}_{\alpha_i - \beta + 1}}{\sum_\beta \prod_i 26\,\mathcal{P}_{\alpha_i - \beta + 1}}.$$

The calculation of the product $\prod_i 26\mathcal{P}_{\alpha_i - \beta + 1}$ may be
done by the method recommended before for

$$\prod_i \frac{25\,\mathcal{P}_{\alpha_i - \beta + 1}}{1 - \mathcal{P}_{\alpha_i - \beta + 1}}.$$

(The table in Fig 3 was in fact made up for $\prod_i 26\mathcal{P}_{\alpha_i - \beta + 1}$.
The differences between the two tables would of course be rather slight). The new
result is more accurate than the old because of the independence assumption in the
original result.

If we only want to know the ratios of the probabilities of the various keys there
is no need to calculate the denominator
$\sum_\beta \prod_i 26\mathcal{P}_{\alpha_i - \beta + 1}$. This denominator has
however another importance: it gives us some evidence about other assumptions,
such as that the cipher is Vigenere, and that the period is 10. This aspect will be
dealt with later (p. ).⁶
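
A minimal sketch of this more accurate form, applied to the first column of Fig 1, is given below. It assumes equal a priori probabilities for the 26 keys and uses the counts of Fig 2; all function and variable names are assumptions made for illustration.

```python
import math
import string

# Counts per 1000 letters of English, copied from Fig 2 (A to Z).
COUNTS = [84, 23, 21, 46, 116, 20, 25, 49, 76, 2, 5, 38, 34,
          66, 66, 15, 2, 64, 73, 81, 19, 11, 21, 16, 24, 3]
FREQ = {letter: n / 1000 for letter, n in zip(string.ascii_uppercase, COUNTS)}

def decode(column, key):
    """Undo a Vigenere slide, with key A meaning no slide."""
    shift = ord(key) - ord("A")
    return "".join(chr((ord(c) - ord("A") - shift) % 26 + ord("A")) for c in column)

def weight(column, key):
    """Product over the column of 26 times the frequency of each decoded letter."""
    return math.prod(26 * FREQ[ch] for ch in decode(column, key))

COLUMN_1 = "DRXTTPBWT"
weights = {key: weight(COLUMN_1, key) for key in string.ascii_uppercase}
denominator = sum(weights.values())          # the sum over all keys
posterior = {key: w / denominator for key, w in weights.items()}

best_key = max(posterior, key=posterior.get)
print(best_key, round(posterior[best_key], 3))
```

Dividing by the sum over all keys is what removes the independence assumption of the earlier method: the 26 theories are mutually exclusive, so their probabilities must add up to one.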

2.2. A letter subtracter problem 

A substitution with the period $91 \times 95 \times 99$ is obtained by superimposing
three substitutions of periods 91, 95, and 99, each substitution being a Vigenere
composed of slides of 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.⁷ The three substitutions are
known in detail, but we do not know for any given message at what point in the 
complete substitution to begin. For many messages however we can provide a more 
or less probable crib. How can we test the probability of a crib before attempting to
solve it? It may be assumed that approximately equal numbers of slides 0, 1, ..., 9
occur in each substitution.

⁶ Forward reference left unresolved in the manuscript.
⁷ Equivalent to keys A to J.

The principle of the calculation is that owing to the way in which the substitution
is built up, not all slides are equally frequent, e.g. a slide of 25 can only be
the sum of slides of 9, 8, and 8, or 9, 9, and 7 whilst a slide of 15 can be any of the
following


    9,6,0    8,7,0    7,7,1    6,6,3
    9,5,1    8,6,1    7,6,2    6,5,4
    9,4,2    8,5,2    7,5,3
    9,3,3    8,4,3    7,4,4


A crib will therefore, other things being equal, be more likely if it requires a slide 
of 15 than if it requires a slide of 25. The problem is to make the best use of this 
principle, by determining the probability of the crib with reasonable accuracy, but 
without spending long over it. 

We have to find the probability of getting a given slide. To do this we can 
apply several methods. 

(a) We can produce a long stretch of key by addition and take a count of the 
resulting slides. This is obviously a very general method, and requires no special 
mathematical technique. It may be rather laborious, but by interpreting a small 
count with common sense one can probably get quite good results. 

(b) There are 1000 possible combinations of slides all equally likely viz. 000, 001, 
..., 999. We can add up the digits in these and take the remainder on division 
by 26, and then count the number of combinations giving each of the possible 
remainders. 

(c) We can make use of a trick which might appear to be rather special, but is 
really applicable to a multitude of problems. Consider the expression 

$$f(x) = (1 + x + x^2 + \cdots + x^9)^3.$$

For each possible way of expressing a number $n$ as the sum of three numbers
0, ..., 9, say $n = m_1 + m_2 + m_3$, there is a term $x^{m_1}x^{m_2}x^{m_3}$ in
$f(x)$, $x^{m_1}$ coming out of the first factor, $x^{m_2}$ out of the second, and
$x^{m_3}$ out of the third. Hence the number of ways of expressing $n$ in the form
$n = m_1 + m_2 + m_3$ is the coefficient of $x^n$ in $f(x)$ i.e. in

$$\frac{(1 - x^{10})^3}{(1 - x)^3},$$

or in

$$(1 - 3x^{10} + 3x^{20} - x^{30})(1 - x)^{-3}.$$

Expanding $(1 - x)^{-3}$ by the binomial theorem

$$\begin{aligned}
(1 - x)^{-3} = {} & 1 + 3x + 6x^2 + 10x^3 + 15x^4 + 21x^5 + 28x^6 + 36x^7 \\
 & + 45x^8 + 55x^9 + 66x^{10} + 78x^{11} + 91x^{12} + 105x^{13} \\
 & + 120x^{14} + 136x^{15} + 153x^{16} + 171x^{17} + 190x^{18} \\
 & + 210x^{19} + 231x^{20} + 253x^{21} + 276x^{22} + 300x^{23} \\
 & + 325x^{24} + 351x^{25} + 378x^{26} + 406x^{27} + 435x^{28} + \cdots.
\end{aligned}$$

Now multiply by $1 - 3x^{10} + 3x^{20} - x^{30}$ and we get

$$\begin{aligned}
f(x) = {} & 1 + 3x + 6x^2 + 10x^3 + 15x^4 + 21x^5 + 28x^6 + 36x^7 \\
 & + 45x^8 + 55x^9 + 63x^{10} + 69x^{11} + 73x^{12} + 75x^{13} \\
 & + 75x^{14} + 73x^{15} + 69x^{16} + 63x^{17} + 55x^{18} \\
 & + 45x^{19} + 36x^{20} + 28x^{21} + 21x^{22} + 15x^{23} \\
 & + 10x^{24} + 6x^{25} + 3x^{26} + x^{27}.
\end{aligned}$$

This means to say that the chances of getting totals 0, 1, 2, ... are in the ratio
1, 3, 6, 10, ... The chances of getting remainders of 0, 1, 2, ... on division
by 26 are in the ratio 4, 4, 6, 10, 15, ... To get true probabilities these must
be divided by their total which is conveniently 1000.

(d) There are two other methods, both connected with the last method but not 
relying so much on the special features of the problem. They will be discussed 
later. 
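Methods (b) and (c) can be checked against one another mechanically. The following is a minimal sketch, not part of the manuscript; the names `poly_mul`, `slide_counts` and `brute_counts` are illustrative. It assumes, as the text does, that a slide is the sum of three key digits 0 to 9 reduced modulo 26.

```python
def poly_mul(p, q):
    """Multiply two polynomials held as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Method (c): f(x) = (1 + x + ... + x^9)^3, one factor per key digit.
digit = [1] * 10
f = poly_mul(poly_mul(digit, digit), digit)   # coefficients for totals 0..27

# Fold the totals onto their remainders mod 26 to get counts out of 1000.
slide_counts = [0] * 26
for total, ways in enumerate(f):
    slide_counts[total % 26] += ways

# Method (b): direct enumeration of the 1000 digit triples 000..999.
brute_counts = [0] * 26
for n in range(1000):
    digit_sum = n // 100 + (n // 10) % 10 + n % 10
    brute_counts[digit_sum % 26] += 1

slide_probs = [c / 1000 for c in slide_counts]   # true probabilities
print(slide_counts)
```

Both routes give the same 26 counts, and dividing by 1000 turns them into the probabilities used for scoring cribs.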

Suppose then that the probabilities have been calculated by one method or the 
other (as in fact we have done under (c)). We can then estimate the values of cribs. 
Let us suppose that a possible crib for a message beginning MVHWUSXOWBVMMK was 
AMBASSADOR so that the slides were 12, 9, 6, 22, 2, 0, 23, 11, 14. The slide of 12 gives 
us some slight evidence in favour of the crib being right, for slides of 12 occur with 
frequency 0.073 with right cribs, whilst with wrong cribs they occur with frequency 
only 1/26. The factor in favour of the crib is therefore 26 × 0.073 or about 1.9. 
A similar calculation may be made for each of the slides, but of course the work 
may be greatly speeded up by having the values of the factors $26c_s/1000$ in half 
decibans tabulated: here $c_s$ is the coefficient of $x^s$ in the above polynomial $f(x)$. 
The table is given below (Fig 6). 


Slide   Slide   Score
  1       0      -20
  2      25      -16
  3      24      -12
  4      23       -8
  5      22       -6
  6      21       -3
  7      20       -1
  8      19        1
  9      18        3
 10      17        4
 11      16        5
 12      15        6
 13      14        6

Figure 6. Scores in half decibans of the various slides. 

Evaluating this crib by means of this table we score 

6 + 3 - 3 - 6 - 16 - 20 - 8 + 5 + 6 (= -33), 

i.e. the crib is worse by a factor of $10^{33/20}$ than it was before, e.g. if the a priori 
odds of the crib were 2:1 against it becomes 98:1 against. This crib was in fact 
made up at random, i.e. the letters of the cipher text were chosen at random. 


[Editor's footnote to (d): No such discussion appears in the manuscript.] 

Now let us take one made up correctly, i.e. really enciphered by the method in 
question, but with a randomly chosen key. 

 N  Y  X  L  N  X  I  Q  H  H
 A  M  B  A  S  S  A  D  O  R
13 12 22 11 21  5  8 13 19 16   (slides)

This scores 15 so that if it were originally 2:1 against, it now becomes nearly 3:1 on. 
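The scoring of the two cribs can be reproduced in a few lines. This is an illustrative sketch only: the table is copied from the figure of half-deciban scores, and the helper names (`slides`, `crib_score`, `posterior_odds`) are assumptions of the sketch, as is the convention A = 0, ..., Z = 25 with the slide taken as cipher letter minus crib letter modulo 26.

```python
# Half-deciban scores per slide, copied from the figure (each row covers two slides).
ROWS = [(1, 0, -20), (2, 25, -16), (3, 24, -12), (4, 23, -8), (5, 22, -6),
        (6, 21, -3), (7, 20, -1), (8, 19, 1), (9, 18, 3), (10, 17, 4),
        (11, 16, 5), (12, 15, 6), (13, 14, 6)]
SCORES = {}
for s1, s2, score in ROWS:
    SCORES[s1] = score
    SCORES[s2] = score

def slides(cipher, crib):
    """Slide at each position: cipher letter minus crib letter, mod 26."""
    return [(ord(c) - ord(p)) % 26 for c, p in zip(cipher, crib)]

def crib_score(slide_list):
    """Total score in half decibans; scores add where factors multiply."""
    return sum(SCORES[s] for s in slide_list)

def posterior_odds(prior_odds, score):
    """Odds on the crib after applying a score given in half decibans."""
    return prior_odds * 10 ** (score / 20)

random_slides = [12, 9, 6, 22, 2, 0, 23, 11, 14]   # slides as listed in the text
true_slides = slides("NYXLNXIQHH", "AMBASSADOR")

print(crib_score(random_slides), crib_score(true_slides))
print(posterior_odds(1 / 2, crib_score(true_slides)))   # prior 2:1 against
```

Working in half decibans means the evaluation of a crib is a table look-up and an addition per letter, which is why the scoring takes so little time.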

Having decided on a crib the natural way to test it is to have a catalogue of the 
positions in which a given series of slides is obtained if the 91 period component 
is omitted. We make 91 different hypotheses as to this third component, draw an 
inference as to what is the part of the slide arising from the components of periods 
95 and 99 combined. This we look up in the catalogue. This process is fairly 
lengthy, and as the scoring of the crib takes only a minute it is certainly worth 
doing. 

2.3. Theory of repeats 

Suppose we have a cipher in which there are several very long series of 
substitutions which can be used for enciphering a message, but that one may sometimes 
get two messages enciphered with the same series of substitutions (or possibly, the 
series of substitutions for one message being those for another with some at the 
beginning omitted). In such a case let us say that the messages fit, or that they 
fit at such and such a distance, the distance being the number of substitutions 
which have to be omitted from the one series to obtain the other series. One will 
frequently want to know whether two messages fit or not, and we may find some 
evidence about this by examining the repeats between them. 

By the repeats between them I mean this. One writes out the cipher texts of 
the two messages with the letters which are thought to have been enciphered with 
the same substitution under one another. One then writes under these messages 
a series of letters O and X, an O being written where the cipher texts differ and 
an X where they agree. The series of letters O and X will begin where the second 
message begins and end where the first to end ends. This series of letters O and X 
may be called the repetition figure. It may be completed by adding at the ends an 
indication of how many letters there are which do not overlap, and which message 
they belong to. 

As an example: 

GFRLIKQGVBMILAFIXMMORDGBYSKYXDAZCHMUMRKBZLDLDDOHCMVTIPRSD
        VLOVDYQCEJSOPYGBMBKYXDAZNBFIOPTFCXDDD
      8 XOOOOOOOOOOXOOXXOOXXXXXXOOOOOOOOOOXOX 12
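The construction of a repetition figure is mechanical, and can be sketched as follows. The function name `repetition_figure` and its return shape are assumptions of this sketch; it supposes the second message starts `distance` letters after the start of the first, with `distance` non-negative and smaller than the length of the first message.

```python
def repetition_figure(first, second, distance):
    """Return (letters before the overlap, figure of X and O, letters after it)."""
    remaining = len(first) - distance          # letters of `first` from the overlap on
    overlap = min(remaining, len(second))
    figure = "".join(
        "X" if first[distance + i] == second[i] else "O"
        for i in range(overlap)
    )
    tail = max(remaining, len(second)) - overlap
    return distance, figure, tail

first = "GFRLIKQGVBMILAFIXMMORDGBYSKYXDAZCHMUMRKBZLDLDDOHCMVTIPRSD"
second = "VLOVDYQCEJSOPYGBMBKYXDAZNBFIOPTFCXDDD"
lead, figure, tail = repetition_figure(first, second, 8)
print(lead, figure, tail)
```

Applied to the example above at a distance of 8, this gives a figure of 37 symbols containing 12 letters X, with 8 letters before the overlap and 12 after it.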

On the whole one expects that a fit is more likely to be right the more letters 
X there are in the repetition figure, and that long series of letters X are especially 
desirable. This is because it would not be very unusual for two fairly common 
words to lie directly under one another when the clear texts are written out, thus 

THEMAINCONVOYWILLARRIVE ...
    ALLCONVOYSMUSTREPORT ...
    XOOXXXXXXOOOOOXOOOO ...

If the corresponding cipher texts really fit, i.e. if the letters in the same 
column are enciphered with the same substitution, then the condition for an X in the 
repetition figure of the cipher texts is that there be an X in the repetition figure of 
the corresponding clear text. Now series of several consecutive letters X can occur 
quite easily as above by two identical words coming under one another, or by such 
combinations as 

ITISEASIERTOTEACHTHANALGEBRA ...
   THERAINWASSUCHTHATHECOULD ...
   OOOOOOOOOOOOXXXXXOOOOOOOO ...

if the messages really fit, but if not they can only occur by complete coincidence. 
One therefore tends to believe that there is a fit when one gets such series of letters X. 
As regards single cases of X the value of them is not so clear, but one can see that 
if $p_\alpha$ is the frequency of letters $\alpha$ in plain language then the frequency of letters X 
as a whole in comparison of plain language with plain language is $\sum_\alpha p_\alpha^2$, whilst 
for wrong fits of cipher text it is 1/26 which is necessarily less. Given a sufficiently 
long repetition figure one should therefore be able to tell whether it is a fit or not 
simply by counting the letters X and O. 

So much is well known. The real point of this section is to show these ideas 
can be developed into an accurate method of estimating the probabilities of fits. 

2.3.1. Simple form of theory. The complete theory takes account of the 
various possible lengths of repeat. As this theory is somewhat complicated it will 
be as well to give first two simplified forms of the theory. In both cases the 
simplification arises by neglecting a part of the evidence. In the first simplified form 
of theory we neglect all evidence except the number of letters X and the number of 
letters O. In the other simplified form the evidence is the number of series of (say) 
four consecutive letters X in a repetition figure. 

When our evidence is just the number of times X occurs in the repetition figure 
($n$ let us say) and the length of the repetition figure ($N$ say), then the factor in 
favour of the fit is 

$$\frac{\text{Probability of a right repetition figure of length } N \text{ having } n \text{ occurrences of X}}{\text{Probability of a wrong repetition figure of length } N \text{ having } n \text{ occurrences of X}}.$$

As an approximation we may assume that the numerator of this expression has 
the same value as if the right repetition figures were produced letter by letter by 
independent random choices, with a certain fixed probability of getting an X at each 
stage. This probability will have to be $\beta = \sum_\alpha p_\alpha^2$. The numerator is then 

{Number of repetition patterns with length N and n occurrences of X} 
× {Probability of getting a given such repetition pattern by the process just mentioned}, 

which we may write as $R(N,n)\,Q(N,n)$. Now let us denote by $\rho_i$ the $i$th symbol 
of the given repetition pattern and put $\tau_X = \beta$ and $\tau_O = 1 - \beta$. Then $Q(N,n)$, 
the probability of getting the repetition pattern, is $\prod_i \tau_{\rho_i}$ which simplifies to 
$\beta^n(1 - \beta)^{N-n}$. We may do a similar calculation for the denominator, but here we 
must take $\beta = 1/26$ since all letters occur equally frequently in the cipher. The 
denominator is then 

$$R(N,n)\left(\frac{1}{26}\right)^n\left(\frac{25}{26}\right)^{N-n}.$$

In dividing to find the factor for the fit $R(N,n)$ cancels out, leaving 

$$(26\beta)^n\left(\frac{26}{25}(1 - \beta)\right)^{N-n}.$$

In other words we score a factor of $26\beta$ for an X and a factor of $(26/25)(1 - \beta)$ for 
an O. More convenient is to regard it as $10\log_{10}[25\beta/(1 - \beta)]$ decibans for an X 
and $10\log_{10}[(26/25)(1 - \beta)]$ per unit length of repetition figure (per unit overlap). 
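The two ways of keeping the score (so much per X and per O, or so much per X plus so much per unit overlap) can be compared numerically. This is a sketch only; the value 0.067 used for $\beta$ is an illustrative assumption and is not taken from the manuscript, and the function names are hypothetical.

```python
import math

def beta_from_frequencies(freqs):
    """beta = sum of squared letter frequencies of the plain language."""
    return sum(p * p for p in freqs)

def simple_fit_score(n_repeats, overlap, beta):
    """Decibans: 26*beta per X and (26/25)(1 - beta) per O, as factors."""
    per_x = 10 * math.log10(26 * beta)
    per_o = 10 * math.log10((26 / 25) * (1 - beta))
    return n_repeats * per_x + (overlap - n_repeats) * per_o

def simple_fit_score_alt(n_repeats, overlap, beta):
    """Same score, counted as so much per X plus so much per unit overlap."""
    per_repeat = 10 * math.log10(25 * beta / (1 - beta))
    per_unit = 10 * math.log10((26 / 25) * (1 - beta))
    return n_repeats * per_repeat + overlap * per_unit

beta = 0.067                      # assumed value, for illustration only
print(simple_fit_score(12, 37, beta))
print(simple_fit_score_alt(12, 37, beta))
```

With $\beta = 1/26$ both scores vanish, as they should: a text with flat letter frequencies gives no evidence either way.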

An alternative argument, leading to the same result, runs as follows. Having 
decided to neglect all evidence except the overlap and the number of repeats we 
pretend that nothing else matters, i.e. that the form of the figure is irrelevant. In 
this case we can regard each letter of the repetition figure as independent evidence 
about the fit. If we get an X the factor for the fit is 

$$\frac{\text{Probability of getting an X if the fit is right}}{\text{Probability of getting an X if the fit is wrong}},$$

i.e. $\beta/(1/26)$. Similarly the factor for an O is $(1 - \beta)/(25/26)$. 

In either form of argument it is unnecessary to calculate the number $R(N,n)$. 
In this particular case there is no particular difficulty about it: it is the 
binomial coefficient. In some similar problems this cancelling out is a great boon, 
as we might not be able to find any simple form for the factor which cancels. The 
cancelling out is a normal feature of this kind of problem, and it seems quite natural 
that it should happen when we think of the second form of argument in which we 
think of the evidence as consisting of a number of independent parts. 

The device of assuming, as we have done here, that the evidence which is not 
available is irrelevant can often be used and usually leads to good results. It is of 
course not supposed that the evidence really is irrelevant, but only that the error 
resulting from the assumption when used in this kind of way is likely to be small. 

2.3.2. Second simplified form of theory. In the second simplified form of 
theory we take as our evidence that a particular part of the repetition figure is 
OXXXXO (say, or alternatively OXXXXXO say). The factor is then 

$$\frac{\text{Frequency of OXXXXO in right repetition figures}}{\text{Frequency of OXXXXO in wrong repetition figures}}.$$

The denominator is 

$$\left(\frac{25}{26}\right)^2\left(\frac{1}{26}\right)^4,$$

and the numerator may be estimated by taking a sample of language hexagrams 
and counting the number of pairs that have the repetition figure OXXXXO. The 
expectation of the number of such pairs is the sum for all pairs of the probabilities 
of those pairs having the desired repetition figure, i.e. is the number of such pairs 
(viz. $N(N - 1)/2$ where $N$ is the size of the sample) multiplied by the frequency of 
OXXXXO repetition figures. This frequency may therefore be obtained by division if 
we equate the expected number of these repetition figures to the actual number. 
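The estimate just described might be sketched as follows. The three hexagrams are a toy sample invented for the illustration; a real estimate would need a large sample of the traffic in question. The function names are hypothetical.

```python
from itertools import combinations

def pattern_frequency(hexagrams, pattern="OXXXXO"):
    """Fraction of the N(N-1)/2 pairs whose repetition figure equals `pattern`."""
    pairs = 0
    hits = 0
    for a, b in combinations(hexagrams, 2):
        fig = "".join("X" if x == y else "O" for x, y in zip(a, b))
        pairs += 1
        hits += (fig == pattern)
    return hits / pairs

def pattern_factor(hexagrams, pattern="OXXXXO"):
    """Factor for the fit: frequency in right figures over frequency in wrong ones."""
    wrong = 1.0
    for symbol in pattern:
        wrong *= (1 / 26) if symbol == "X" else (25 / 26)
    return pattern_frequency(hexagrams, pattern) / wrong

sample = ["THEREF", "WHEREA", "ABCDEF"]    # toy sample, for illustration only
print(pattern_frequency(sample))
```

Only the first pair of the toy sample gives OXXXXO, so the estimated frequency is one in three; with so small a sample the resulting factor is of course meaningless.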
2.3.3. General form of theory. It is not of course possible to have statistics 
of every conceivable repetition figure. We must make some assumptions to reduce 
the variety that need to be considered. The following assumption is theoretically 
very convenient, and also appears to be a very good approximation. 

The probabilities of repeats at two points known to be separated by a point where 
there is known to be no repeat are independent. 

We may also assume that the probability of a repeat is independent of anything 
but the repetition figure in this neighbourhood. (We may however as a refinement 
produce different positions in a message). We can therefore think of repetition 
figures as being produced by selecting the symbols of the figure consecutively, the 
probability of getting an X at each stage being determined by the repetition figure 
from the point in question back as far as the last O. Sometimes this will take us 
back as far as the beginning of the message, and will include the number telling us 
how many more letters there are which do not repeat at all. We need in practice 
only distinguish two cases, where this number is 0 and when it is more. We may 
also neglect the question as to which message occurs first. We therefore have to 
distinguish the following cases 


O       a_0       some        b_0       none        c_0
OX      a_1       some X      b_1       none X      c_1
OXX     a_2       some XX     b_2       none XX     c_2
OXXX    a_3       some XXX    b_3       none XXX    c_3


The entries $a_0$, $a_1$, $b_0$, etc. opposite the repetition figures are the notations we are 
adopting for the probability of getting another X following such a figure. Strictly 
speaking we should also bring in a notation for the probability of the message 
coming to an end after any given repetition figure. As the repeats at the end of a 
comparison do not appear to behave very differently from those in the main part 
of the message I shall neglect this complication by assuming that the probability 
of getting an O added to the probability of getting an X is 1, and that afterwards 
one cuts off the end of the series arbitrarily. 

Let us calculate the factor for the repeat figure 

none  X     X     X     X     O     | O     | O     | X
      c_0   c_1   c_2   c_3   1-c_4 | 1-a_0 | 1-a_0 | a_0
      1/26  1/26  1/26  1/26  25/26 | 25/26 | 25/26 | 1/26

O     | X     X     X     O     | O     | X     X     some
1-a_1 | a_0   a_1   a_2   1-a_3 | 1-a_0 | a_0   a_1
25/26 | 1/26  1/26  1/26  25/26 | 25/26 | 1/26  1/26

[Editor's footnote on the figure: In the manuscript, Turing squeezes the figure into three lines 
by spilling into the margins and use of pen and ink. The typeset equivalent is unreadable, so the 
figure has been split into left and right components. 
Reassemble as: none XXXXO|O|O|XO|XXXO|O|XX| some] 

Underneath each symbol has been written the probability that one would get 
that symbol, knowing the ones which precede, both for the case of a right and of a 
wrong repetition figure. The factor for the fit is the product of the first row divided 
by the product of the second. It is convenient to split this up as indicated by the 
vertical lines into the product of 


$$\frac{c_0c_1c_2c_3(1 - c_4)}{(1/26)^4 \times (25/26)},$$

$$\frac{1 - a_0}{(25/26)} \quad\text{- occurring three times,}$$

$$\frac{a_0(1 - a_1)}{(1/26) \times (25/26)},$$

$$\frac{a_0a_1a_2(1 - a_3)}{(1/26)^3 \times (25/26)},$$

$$\frac{a_0a_1}{(1/26)^2},$$

and this product may be put into the form of the product of 

$$\frac{c_0c_1c_2c_3(1 - c_4)}{(1/26)^4 \times (25/26)} \div \left(\frac{1 - a_0}{(25/26)}\right)^5,$$

- which we call the factor for an initial tetragramme repeat, 

$$\frac{a_0(1 - a_1)}{(1/26) \times (25/26)} \div \left(\frac{1 - a_0}{(25/26)}\right)^2,$$

- the factor for a single repeat, 

$$\frac{a_0a_1a_2(1 - a_3)}{(1/26)^3 \times (25/26)} \div \left(\frac{1 - a_0}{(25/26)}\right)^4,$$

- the factor for a trigramme, 

$$\frac{1 - a_0}{1 - a_2},$$

- the correction for a final bigramme, 

$$\left(\frac{1 - a_0}{(25/26)}\right)^{16},$$

- the factor for an overlap of 16, 

$$\frac{a_0a_1(1 - a_2)}{(1/26)^2 \times (25/26)} \div \left(\frac{1 - a_0}{(25/26)}\right)^3,$$

- the factor for a bigramme. 

We shall neglect the correction for a final bigramme (or whatever it may be). 
It is in any case rather small, and vanishes if the repetition figure ends with O; also 
with our conventions the whole question of the ends of repetition figures has been 
left rather in doubt. 
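The symbol-by-symbol calculation of the factor can be written as a short loop. This is a sketch under stated assumptions: the lists `a`, `b`, `c` hold hypothetical values of $a_r$, $b_r$, $c_r$ chosen only for illustration, and `start` is "none" or "some" according to whether there are letters before the overlap.

```python
def general_fit_factor(figure, start, a, b, c):
    """Factor for a repetition figure, taking one symbol at a time."""
    table = c if start == "none" else b    # conditional probabilities in force
    run = 0                                # consecutive X immediately before
    factor = 1.0
    for symbol in figure:
        p_x = table[run]                   # chance of a further X if the fit is right
        if symbol == "X":
            factor *= p_x / (1 / 26)
            run += 1
        else:
            factor *= (1 - p_x) / (25 / 26)
            table, run = a, 0              # after an O only the a_r apply
    return factor

# Hypothetical probabilities, for illustration only.
a = [0.07, 0.25, 0.40, 0.50, 0.55, 0.60]
b = [0.07, 0.25, 0.40, 0.50, 0.55, 0.60]
c = [0.10, 0.30, 0.45, 0.55, 0.60, 0.65]

# The figure worked in the text: none XXXXO|O|O|XO|XXXO|O|XX| some
example = general_fit_factor("XXXXOOOXOXXXOOXX", "none", a, b, c)
print(example)
```

The loop makes no use of the regrouping into r-gramme factors and an overlap factor; that regrouping is only a convenience for hand computation, since each part can then be looked up in a table and added.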



Now let us put 

$$\begin{aligned}
a_0a_1 \cdots a_r(1 - a_{r+1}) &= k_r, \\
b_0b_1 \cdots b_r(1 - b_{r+1}) &= j_r, \\
c_0c_1 \cdots c_r(1 - c_{r+1}) &= i_r.
\end{aligned}$$

The values of the $k_r$ can be obtained as follows. We take a number of plain language 
messages and leave out two or three words at the beginning. Then combine the 
messages to form one long message; this message may be made to eat its own 
tail, i.e. it may be written round a circle. If the message were compared with 
itself in every possible position, except level, we should expect to get repetition 
figures which, when divided up as shown by vertical lines after each O, contain 
$\frac{N(N-1)}{2}\,h\,k_r$ $(= N_r)$ parts which consist of $r$ symbols X followed by an O, or as we 
may say $N_r$ actual $r$-gramme repeats, where $h$ is the probability of an O. 

The values of Nr can be calculated given the apparent number of r-gramme 
repeats Mr for each r. This apparent number of r-gramme repeats is the number 
of series of r consecutive symbols X in the repetition figures regardless of what 
precedes or follows the series. 

By considering the ways in which an actual repeat can give rise to the apparent 
repeat of various lengths we see that 

$$M_r = N_r + 2N_{r+1} + 3N_{r+2} + \cdots,$$

and therefore 

$$M_r - M_{r+1} = N_r + N_{r+1} + N_{r+2} + \cdots,$$

and 

$$(M_r - M_{r+1}) - (M_{r+1} - M_{r+2}) = N_r.$$
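The relation between apparent and actual repeats can be checked on small figures. In this sketch (function names are illustrative) the apparent count $M_r$ is obtained by counting every window of $r$ consecutive letters X, and the actual count $N_r$ by the second difference above.

```python
import re

def apparent_repeats(figures, max_r):
    """M[r] = number of series of r consecutive X, whatever precedes or follows."""
    M = [0] * (max_r + 3)                  # indices 1..max_r+2 are used
    for fig in figures:
        for run in re.findall(r"X+", fig):
            n = len(run)
            # a run of n letters X contains n - r + 1 windows of length r
            for r in range(1, min(n, max_r + 2) + 1):
                M[r] += n - r + 1
    return M

def actual_repeats(M, max_r):
    """N_r = M_r - 2 M_{r+1} + M_{r+2}, for r = 1..max_r."""
    return {r: M[r] - 2 * M[r + 1] + M[r + 2] for r in range(1, max_r + 1)}

M = apparent_repeats(["XXXO", "OXOXXO"], 3)
N = actual_repeats(M, 3)
print(M[1:4], N)
```

The two toy figures contain one single repeat, one bigramme and one trigramme, which is what the second differences recover from the apparent counts 6, 3, 1.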

The calculation of $j_r$ may perhaps best be done by comparing the beginners of 
a number of messages with the long circular message, and the values of $i_r$ by 
comparing the beginners among themselves. A similar technique of actual and 
apparent numbers of repeats can be used. I shall not go into this in detail. The 
formulae required may now be assembled. 

$\mu_r$ = decibanage for an $r$-gramme repeat, 

$\gamma$ = negative decibanage for unit overlap, 

$S_{\beta,r}$ = number of occurrences in the statistics of the $r$-gramme $\beta$, 

$N$ = total number of letters in the statistics. 


Then if 

$$M_r = \sum_\beta \tfrac{1}{2}\,S_{\beta,r}\,(S_{\beta,r} - 1),$$

$$N_r = M_r - 2M_{r+1} + M_{r+2},$$

$$L = \frac{N(N-1)}{2},$$

$$k_r = \frac{N_r}{Lh}.$$


[Editor's footnote: The manuscript has a pencilled note beside $k_r$ indicating it is to be read as $k_{r+1}$. We 
presume that this also means that $j_r$ should be $j_{r+1}$, and $i_r$ should be $i_{r+1}$. However, these are 
not indicated and no changes are made in the subsequent text. We leave the text unchanged.] 
$h$ may be calculated as follows. From the identity 

$$(1 - a_0) + a_0(1 - a_1) + a_0a_1(1 - a_2) + \cdots = 1,$$

we get 

$$k_0 + k_1 + k_2 + \cdots = 1;$$

$$\frac{L - M_1}{Lh} = 1,$$

$$(1 - a_0) = k_0 = \frac{N_0}{L - M_1} = \frac{L - 2M_1 + M_2}{L - M_1},$$

$$\mu_r = 10\log_{10}\left(\frac{26^{r+1}k_r}{25}\right) + (r + 1)\gamma,$$

$$\gamma = -10\log_{10}\left(\frac{26(1 - a_0)}{25}\right).$$


2.4. Transposition ciphers 

2.4.1. A probability problem. In making calculations about substitution 
ciphers we have often found it useful to treat the plain language as if it were 
produced by independent choices for the letters, using certain fixed frequencies with 
which the letters are chosen. Our method for Vigenere and one of the simplified 
forms of repeat theory could be based on this sort of assumption. With a 
transposition cipher however such an assumption would be useless or worse than useless, 
for it would result in the conclusion that all transpositions were equally likely. We 
have therefore to take a slightly less crude assumption, and the one which suggests 
itself is that the letters forming the plain language are chosen consecutively, the 
probability of getting a particular letter depending only on what the letter is and 
what the preceding letter was. It is easily verified that if $p_{\alpha\beta}$ is the proportion of 
bigrammes $\alpha\beta$ in plain language and $p_\alpha$ the frequency of the letter $\alpha$ then the 
probability $q_{\alpha\beta}$ of a letter $\beta$ following an $\alpha$ is $p_{\alpha\beta}/p_\alpha$. The probability of a piece 
of plain language of length $L$ letters saying $\alpha_1\alpha_2 \ldots \alpha_L$ is then 

$$p_{\alpha_1} \times q_{\alpha_1\alpha_2} \times q_{\alpha_2\alpha_3} \times \cdots \times q_{\alpha_{L-1}\alpha_L},$$

which may also be written as 

$$J(\alpha_1, \ldots, \alpha_L).$$
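The quantity $J$ is simply the probability of a text under a chain in which each letter depends on its predecessor. A minimal sketch follows; the sample is treated as written round a circle so that the bigramme proportions sum to the letter frequencies, which is a convenience assumed here rather than something stated at this point in the text, and the function names are hypothetical.

```python
from collections import Counter

def bigram_model(sample):
    """Letter frequencies p[a] and bigramme proportions p2[(a, b)] from a circular sample."""
    n = len(sample)
    p = {a: k / n for a, k in Counter(sample).items()}
    pairs = zip(sample, sample[1:] + sample[0])      # wrap round to the start
    p2 = {ab: k / n for ab, k in Counter(pairs).items()}
    return p, p2

def J(text, p, p2):
    """p[a1] * q[a1 a2] * q[a2 a3] * ..., with q[ab] = p2[ab] / p[a]."""
    if text[0] not in p:
        return 0.0
    prob = p[text[0]]
    for a, b in zip(text, text[1:]):
        if a not in p:
            return 0.0
        prob *= p2.get((a, b), 0.0) / p[a]
    return prob

p, p2 = bigram_model("ABAC")      # toy sample, for illustration only
print(J("ABA", p, p2), J("ABC", p, p2))
```

In the toy sample the bigramme BC never occurs, so any text containing it gets probability zero; with real statistics one would want every bigramme to have some small non-zero proportion.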

We may also calculate the probability of a given piece of plain language having 
certain given letters in given places, the remainder of the message being unspecified. 
The probability is given by 

$$\sum_{(\zeta_1, \ldots, \zeta_L)\ \text{consistent with data}} J(\zeta_1, \ldots, \zeta_L),$$

and if the data is that the known letters are 


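$$\underbrace{\cdots}_{n_1\ \text{dots}}\ \beta_1\ \underbrace{\cdots}_{n_2\ \text{dots}}\ \beta_2\ \cdots\ \beta_{r-1}\ \underbrace{\cdots}_{n_r\ \text{dots}}\ \beta_r \tag{1}$$

it is approximately 

$$\prod_r p_{\beta_r} \prod_{n_{r+1} = 0} \frac{p_{\beta_r\beta_{r+1}}}{p_{\beta_r}\,p_{\beta_{r+1}}}. \tag{2}$$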


The manuscript has as the first term ^ pencilled annotation indicates that the (3r 

is to be read as This substitution has been made in the text. 
A more or less rigorous deduction of this approximation from the assumptions above 
is given at the end of the section. For the present let us see how it can be applied. 
If we have two theories about the transposition of which the one requires the above 
pattern of letters, and the other brings the same letters into positions in which no 
two of them are consecutive, then the factor in favour of the first as compared with 
the second is 

$$\prod_{n_{r+1} = 0} \frac{p_{\beta_r\beta_{r+1}}}{p_{\beta_r}\,p_{\beta_{r+1}}}.$$
We can apply this straightforwardly to the case of a simple transposition by columns. 
The following text is known to be a simple transposition of a certain type of German 
text with a key length of not more than 15. 

SATPTWSFASTAUTEEAIEUFHWTJTDDGC 
NLTSEFCUIEBOEYQHGTJTEEFIEORTAR 
URNLNNNNAIEOTUSHLESBFBRNDXGNJH 
UANWR 


To solve this transposition, we may try comparing the first six letters of 
S A T P T W which we know form part of one column with each other series of 
six letters in the message, for we know that one such comparison will give entirely 
bigrammes occurring in the decode. We may try first 

S F 
A A 
T S 
P T 
T A 
W U 

The factor for a transposition which brings these letters together, as compared 
with one which leaves them apart is 

$$\frac{p_{SF}}{p_S\,p_F} \times \frac{p_{AA}}{p_A\,p_A} \times \cdots \times \frac{p_{WU}}{p_W\,p_U}.$$

By using a table of values of $20\log_{10}\dfrac{p_{\alpha\beta}}{p_\alpha p_\beta}$ 
made up for the type of traffic in question, and given to the nearest integer (table of 
values of $p_{\alpha\beta}/(p_\alpha p_\beta)$ expressed in half-decibans) we get the product by addition. 
Such a table is shown in Fig 6. The scores for these particular columns are SF 
-7, AA -7, TS -2, PT -10, TA -3, WU -13, totalling -36. If we consider this 
combination as a priori about 100:1 against (there are 95 letters in the message) it 
is a posteriori about 3000:1 against. 
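The making of such a table and its use on a pair of columns can be sketched as below. The frequencies and the two-entry table are invented for the illustration and are not taken from the figure; the function names are hypothetical.

```python
import math

def half_deciban_score(p_ab, p_a, p_b):
    """Table entry for a bigramme: 20 log10(p_ab / (p_a p_b)), to the nearest integer."""
    return round(20 * math.log10(p_ab / (p_a * p_b)))

def column_pair_score(earlier, later, table):
    """Score for setting `later` immediately to the right of `earlier`."""
    return sum(table[(x, y)] for x, y in zip(earlier, later))

# Invented values, for illustration only.
entry = half_deciban_score(0.02, 0.1, 0.05)     # the bigramme is 4 times as common as chance
toy_table = {("A", "B"): 3, ("C", "D"): -5}
print(entry, column_pair_score("AC", "BD", toy_table))
```

Because the table is in logarithmic units the product of the bigramme factors down a pair of columns becomes a sum, which is what makes it practical to try every possible pairing by hand.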


As for the Vigenere problem above, Turing’s statement of the ciphertext is slightly different 
from that which he scores for decryption. The second line in the ciphertext below begins NLTS, 
however, this changes to NITS in the scoring example in Figure 7 below. See also the notes 
accompanying the cleartext. 
Figure 6. Exclusive bigramme scores in half decibans, i.e. $20\log_{10}\dfrac{p_{\alpha\beta}}{p_\alpha p_\beta}$, for a certain kind of German traffic. 

Figure 7. Scoring the matching of columns in a simple transposition. 
Correct matchings noted. 
Similar scoring may be done for every possible comparison of S A T P T W 
with six consecutive letters of the message. The comparison may be made both 
with S A T P T W as earlier and as later column; one may also use the last six 
letters of the message H U A N W R. 

The results of doing this are shown in Fig 7. The message has been written 
out vertically. The first column of figures after the message gives the score for 
S A T P T W as earlier column, entered against the first letter of the later column, 
e.g. the -36 as calculated above gets entered against the F of F A S T A U. The 
second column after the message consists of the scores for H U A N W R as first 
column [and the column before the message gives scores for H U A N W R as second 
column]. One of these columns has been worked out in detail but in the other 
two crosses have been put in where the scores are very bad. 

The scores which eventually turned out to be right are ringed. The fourth 
comparison, which did not have to be done, scored very badly viz. -27. Amongst 
the good scores which were wrong there was one score of 37. It was not difficult to 
see that this one was wrong as most of the score came from W O which requires Z to 
precede it, and there was no Z in the message. Apart from this fact the comparison 
was about evens, although if we take into account the fact that there was no better 
score it would be better. [We have already had a case of this kind of thing in 
connection with Vigenere; if the various positions are a priori equally likely and 
the factors are $f_1, f_2, \ldots, f_N$ then the value $f_r/\sum_i f_i$ for the probability of the $r$th 
alternative is better than $(f_r/N)/(1 + f_r/N)$.] 
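The two estimates compared in the bracketed remark are easily set side by side. In this sketch the list of factors is invented for the illustration and the function names are hypothetical; the second function treats the $r$th alternative on its own, with prior odds taken as $1/N$ as in the expression in the text.

```python
def probability_from_all_factors(factors, r):
    """f_r / sum of all f_i: uses the factors of every alternative."""
    return factors[r] / sum(factors)

def probability_from_single_factor(factors, r):
    """(f_r/N) / (1 + f_r/N): uses only the factor of the r-th alternative."""
    x = factors[r] / len(factors)
    return x / (1 + x)

factors = [40.0, 0.5, 0.2, 0.1]        # invented values, for illustration only
print(probability_from_all_factors(factors, 0))
print(probability_from_single_factor(factors, 0))
```

When the other alternatives all score badly the first estimate is much the higher, which is the sense in which knowing that there was no better score improves the case for the best one.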


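2.4.2. The Probability Formula. (Semi) Rigorous deduction of the formula 
(2). (This is something of a digression). 

The probability of a piece of plain language coinciding where necessary with 
the data (1) is 

$$p_{\beta_1}\,T_{n_2,\beta_1\beta_2}\,T_{n_3,\beta_2\beta_3} \cdots,$$

where 

$$T_{n,\alpha\beta} \ \text{is} \ \sum_{\eta_1\eta_2 \ldots \eta_n} q_{\alpha\eta_1}q_{\eta_1\eta_2} \cdots q_{\eta_n\beta},$$

since 

$$\sum_\gamma p_\gamma q_{\gamma\beta_1} = p_{\beta_1}.$$

We can put 

$$T_{n,\alpha\beta} = (Q^{n+1})_{\alpha\beta},$$

where $Q$ is the matrix whose $\alpha\beta$ coefficient is $q_{\alpha\beta}$. The formula (2) 
would then be accurate if we could say that for $n > 0$, 

$$T_{n,\alpha\beta} = p_\beta.$$

This is not true, but it is true that except for very special values for $q_{\alpha\beta}$, 

$$(Q^n)_{\alpha\beta} \to p_\beta \quad \text{as } n \to \infty,$$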
and the convergence is rather rapid. 

[Editor's footnote: The passage from "and the column" to the end of the sentence has the note in pencil: I doubt it - S. W.] 

[Editor's footnote: Using Turing's scoring recommendations and a key length of 12 with sequence 5, 11, 8, 
7, 3, 10, 6, 12, 9, 4, 1, 2, a cleartext emerges: BNTO SJJ ALBA RFJ STATT IN GST B HEUTE DEN 
ETA RUFS PEDUNYAR NACHT FGFNQUUDNUL WICH AHTR X WIESEN WI GEN GRESFQITE TE. With 
Turing's original statement of the ciphertext, as noted above, GRESFOITE becomes GRESFOLTE. Turing 
scores for the I and not for L, although it makes no difference in the decision to align bigrams.] 

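To prove this I shall assume that the eigenvalues of $Q$ are all different in 
modulus. In this case we can find a matrix $U$ with unit determinant, such that $U^{-1}QU$ 
is in the diagonal form 

$$M = U^{-1}QU = \begin{pmatrix}
\mu_1 & 0 & 0 & \cdots & 0 \\
0 & \mu_2 & 0 & \cdots & 0 \\
0 & \cdots & \ddots & & 0 \\
\vdots & & 0 & \mu_{25} & 0 \\
0 & \cdots & 0 & 0 & \mu_{26}
\end{pmatrix};$$

since $QU = UM$ we have 

$$\sum_\gamma q_{\alpha\gamma}u_{\gamma\beta} = \sum_\gamma u_{\alpha\gamma}m_{\gamma\beta},$$

i.e. 

$$\sum_\gamma q_{\alpha\gamma}u_{\gamma\beta} = \mu_\beta u_{\alpha\beta}.$$

That is, for each $\beta$, $u_{\alpha\beta}$ provides a solution of 

$$\sum_\gamma q_{\alpha\gamma}l_\gamma = \mu l_\alpha \tag{3}$$

with $\mu = \mu_\beta$. Conversely if we have any solution of (3) then $\mu = \mu_\vartheta$, $l_\alpha = k u_{\alpha\vartheta}$ for 
some $k$, $\vartheta$ and all $\alpha$, for as $U$ is non singular we can find numbers $c_\gamma$ such that 

$$l_\alpha = \sum_\gamma u_{\alpha\gamma}c_\gamma \quad \text{for all } \alpha,$$

and then substituting in (3) we get 

$$\sum_{\gamma,\delta} q_{\alpha\gamma}u_{\gamma\delta}c_\delta = \mu \sum_\delta u_{\alpha\delta}c_\delta,$$

i.e. 

$$\sum_\delta (\mu_\delta c_\delta - \mu c_\delta)\,u_{\alpha\delta} = 0.$$

Which, since $U$ is nonsingular, implies $\mu = \mu_\delta$ or $c_\delta = 0$ for all $\delta$. 

As the series $\mu_1, \ldots, \mu_{26}$ are all different there is only one value $\vartheta$ of $\delta$ for which 
$\mu = \mu_\delta$ and so $l_\alpha = c_\vartheta u_{\alpha\vartheta}$ for all $\alpha$. Now putting $l_\alpha = 1$ for all $\alpha$ we see that one 
member of the series $\mu_1, \ldots, \mu_{26}$ is 1, for (3) is certainly satisfied. 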

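I shall prove that the remaining eigenvalues satisfy $|\mu| < 1$. We first prove that 
if $\mu \neq 1$ then $\sum_\alpha p_\alpha l_\alpha = 0$. This follows by multiplying (3) on each side by $p_\alpha$ and 
summing. Since 

$$q_{\alpha\beta} = \frac{p_{\alpha\beta}}{p_\alpha} \quad \text{and} \quad \sum_\alpha p_{\alpha\beta} = p_\beta,$$

we get 

$$\sum_{\alpha,\gamma} p_{\alpha\gamma}l_\gamma = \sum_\gamma p_\gamma l_\gamma = \mu \sum_\alpha p_\alpha l_\alpha,$$

which implies 

$$\mu = 1 \quad \text{or} \quad \sum_\alpha p_\alpha l_\alpha = 0.$$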
Next we show that each $\mu$ for which $|\mu| > 1$ is real and positive. Let $l_\alpha$ satisfy 
(3) with $|\mu| > 1$; then the eigenvalue for $\bar{l}_\alpha$ is $\bar{\mu}$ and so 

$$\sum_\beta (Q^r)_{\alpha\beta}\left(1 + \varepsilon(l_\beta + \bar{l}_\beta)\right) = 1 + 2\varepsilon\,\Re\,\mu^r l_\alpha.$$

If $\varepsilon > 0$ has been chosen so small that $\varepsilon\,\Re\, l_\beta > -1/2$ for all $\beta$ then the L.H.S. is 
positive for the coefficients in the matrix are positive, whereas the R.H.S. is negative 
for suitably chosen $r$, unless $l_\alpha = 0$. If now $\mu > 1$ we may take it that $l_\alpha$ is real for 
each $\alpha$. As it must satisfy $\sum_\alpha p_\alpha l_\alpha = 0$ it is negative for some $\alpha$, but then 

$$\sum_\beta (Q^r)_{\alpha\beta}(1 + \varepsilon l_\beta) = 1 + \varepsilon\mu^r l_\alpha,$$

and if $\varepsilon$ is chosen so that $1 + \varepsilon l_\beta > 0$ for all $\beta$ the L.H.S. is positive whereas the 
R.H.S. is negative for sufficiently large $r$. 

All the eigenvalues therefore satisfy $|\mu| \le 1$; as the eigenvalues are all different 
in modulus this means that $|\mu| < 1$ except for one value of $\mu$. Then as $r \to \infty$, $M^r$ 
tends to a matrix which has only one element different from 0, and that a 1 on the 
diagonal, say in position $\alpha\alpha$. 

Calling this matrix $X_\alpha$, the series of matrices $Q^r$ tends to the matrix $U X_\alpha U^{-1}$. 
This matrix is the one and only one $Y$ which satisfies $YQ = Y$, $Y^2 = Y$, $Y \neq 0$ and 
is therefore the one whose $\alpha\beta$ coefficient is $p_\beta$. 
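The rapid convergence claimed here is easily seen numerically. The sketch below uses an invented three-letter alphabet whose bigramme proportions are consistent with the letter frequencies (row sums equal column sums); it is an illustration under that assumption and not a substitute for the argument above.

```python
def mat_mul(A, B):
    """Product of two square matrices held as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Invented bigramme proportions p_ab for a 3-letter alphabet (rows a, columns b).
joint = [[0.20, 0.10, 0.10],
         [0.10, 0.10, 0.10],
         [0.10, 0.10, 0.10]]
p = [sum(row) for row in joint]                                  # letter frequencies
Q = [[joint[a][b] / p[a] for b in range(3)] for a in range(3)]   # q_ab = p_ab / p_a

# Largest deviation of (Q^n)_ab from p_b, for n = 1, 2, ..., 8.
power = Q
history = []
for n in range(1, 9):
    history.append(max(abs(power[a][b] - p[b])
                       for a in range(3) for b in range(3)))
    power = mat_mul(power, Q)

print(history)
```

In this toy case the deviation shrinks by a constant factor at each step, the factor being the second largest eigenvalue modulus of $Q$.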


2.4.3. Another probability problem. There is another probability problem 
that arises in connection with simple transpositions. With a message of length $L$, 
and a key length of $K$ what is the probability that the $m$th letter will be at the 
bottom of a column? Let $D$ be the length of the short columns, i.e. $D = [L/K]$, 
and let $E = L - DK$. Then if the $m$th letter is at the bottom of the $w$th column 
we must have 

$$\frac{m}{D + 1} \le w \le \frac{m}{D},$$

and there will be $(D + 1)w - m$ short and $m - Dw$ long columns among these first 
$w$ columns. There are 

$$\binom{w}{m - Dw}\binom{K - w}{E - m + Dw}$$

ways in which the short and long columns can be arranged consistently with this, 
and altogether $\binom{K}{E}$ ways in which the columns can be arranged, so that the 
probability of the $m$th letter being at the bottom of a column is 

$$\sum_{m/(D+1) \,\le\, w \,\le\, m/D} \frac{\dbinom{w}{m - Dw}\dbinom{K - w}{E - m + Dw}}{\dbinom{K}{E}}.$$

There will normally be very few terms in the sum. 

Let us take the case of the message of length 133 and consider the 45th letter, 
assuming the key length is between 10 and 20 (inclusive). Here L = 133, m = 45. 

K = 10, D = 13, E = 3,  m/(D+1) = 3+, m/D = 3+: no terms 
K = 11, D = 12, E = 1,  m/(D+1) = 3+, m/D = 3+: no terms 
K = 12, D = 11, E = 1,  m/(D+1) = 3+, m/D = 4+: only terms w = 4, m - Dw = 1; probability is: 
K = 13, D = 10, E = 3,  m/(D+1) = 4+, m/D = 4+: no terms 
K = 14, D = 9,  E = 7,  m/(D+1) = 4+, m/D = 5:  only terms w = 5, m - Dw = 0; probability is: 
K = 15, D = 8,  E = 13, m/(D+1) = 5,  m/D = 5+: only terms w = 5, m - Dw = 5; probability is: 
K = 16, D = 8,  E = 5,  m/(D+1) = 5,  m/D = 5+: only terms w = 5, m - Dw = 5; probability is: 
K = 17, D = 7,  E = 14, m/(D+1) = 5+, m/D = 6+: only terms w = 6, m - Dw = 3; probability is: (Editor - 1/34 is 0.0294.) 
K = 18, D = 7,  E = 7,  m/(D+1) = 5+, m/D = 6+: only terms w = 6, m - Dw = 3; probability is: 4950/15912 = 0.311, 
K = 19, D = 7,  E = 0:  probability is: 
K = 20, D = 6,  E = 13, m/(D+1) = 6+, m/D = 7+: only terms w = 7, m - Dw = 3; probability is: (35 × 143)/15504 = 0.323. 

[Editor's footnote: Turing is using binomial coefficient notation in this section; 
$\binom{n}{k} = C(n,k) = \dfrac{P(n,k)}{P(k,k)} = \dfrac{n!}{(n - k)!\,k!}$.] 
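The formula of this section can be evaluated directly for each key length. The sketch below is an independent computation from the formula as stated, with a hypothetical function name; the values it prints are not taken from the manuscript and need not agree with every figure quoted there.

```python
from math import comb

def bottom_of_column_probability(L, K, m):
    """Probability that the m-th letter ends a column, for message length L and key length K."""
    D, E = divmod(L, K)               # short column length, number of long columns
    total = 0
    for w in range(1, K + 1):
        long_cols = m - D * w         # long columns among the first w
        short_cols = (D + 1) * w - m  # short columns among the first w
        if long_cols < 0 or short_cols < 0 or long_cols > E:
            continue                  # w lies outside m/(D+1) <= w <= m/D, or is impossible
        # comb(n, k) is 0 when k > n, so impossible arrangements drop out
        total += comb(w, long_cols) * comb(K - w, E - long_cols)
    return total / comb(K, E)

for K in range(10, 21):
    print(K, round(bottom_of_column_probability(133, K, 45), 4))
```

For K = 17 and K = 18 the computed values are 1/34 and 4950/15912, in agreement with the editor's note and the figure 0.311 given above.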



